{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MTCNN Training - Multi-Task Cascaded Convolutional Networks\n",
    "\n",
    "As a series of tutorials on the most popular deep learning algorithms for new-entry deep learning research engineers, MTCNN has been widely adopted in industry for human face detection task which is an essential step for subsquential face recognition and facial expression analysis. This tutorial is designed to explain how to train the algorithm for face detection task.\n",
    "\n",
    "## Training data preparing \n",
    "\n",
    "MTCNN is leveraged to perform three tasks - face/non-face classification, bounding box regression and facial landmark localization. [WIDER FACE](http://shuoyang1213.me/WIDERFACE/) (left figure) face detection data could be used for face classification and bounding box regression. [CNN_FacePoint](http://mmlab.ie.cuhk.edu.hk/archive/CNN_FacePoint.htm) (right figure) is then used for facial landmark localization. \n",
    "\n",
    "<table><tr>\n",
    "<td> <img src=\"images/ipy_pic/wider_face.jpeg\"  width=\"400\"> </td>\n",
    "<td> <img src=\"images/ipy_pic/CNN.jpg\"  width=\"400\" > </td>\n",
    "</tr></table>\n",
    "\n",
    "Since different tasks are applied for each network, we need to prepare separate datesets and annotation for each task and CNN. The strategy here is to prepare image patches (either randomly cropped or using generated candidate windows from upper CNN) from wider_face and CNN_facepoint datasets. The following datasets are needed to be prepared。\n",
    "\n",
    "* Negative face samples: assigned with label 0 if Intersection-over-Union (IoU) < 0.3 to any ground-truth faces.\n",
    "* Positive face samples: assigned with label 1 if IoU > 0.65 to any ground-truth faces. \n",
    "* Part face samples: assigned with label -1 if IoU between 0.4 and 0.65 to any ground-truth faces.\n",
    "* Landmark samples: assigned with label -2 if IoU > 0.65 with 5 landmarks' positions. \n",
    "\n",
    "Negative and positive face samples are used for face classification. Positive and part face samples are for bounding box regression. Landmark samples are for facial landmark localization. The dataset lengths of four categories are prepared to be around 3 (negative):1 (positive):1 (part):2 (landmark). \n",
    "\n",
    "The overall training data preparing process can be demostrated as below figure. The annotation data file lists the image path, label, box offset, landmark offset.\n",
    "\n",
    "* Pnet: randomly crop image patches from WIDER_FACE datasets to collect positives, negatives and part faces \n",
    "* Rnet: use Pnet to generate candidate windows from WIDER_FACE datasets. crop, assign and collect corresponding datasets  \n",
    "* Onet: similar to Rnet, but use Pnet and Rnet together to generate candidate windows. Randomly crop image patches from CNN_FacePoint to collect landmark datasets \n",
    "\n",
    "<img src=\"images/ipy_pic/training.png\"  width=\"1000\" style=\"float: left;\">\n",
    "\n",
    "## loss function \n",
    "\n",
    "The loss function for each task is summarized as below:\n",
    "\n",
    "1. face classification: cross entropy loss is applied to formulate a two-class classification problem. p denotes the probability producted by softmaxing the network outputs. y in {0, 1} denotes the ground-truth label.\n",
    "\n",
    "2. bounding box regression: normalied and represent the offsets of candidate window coordinates. Euclidean loss is applied between the network out and ground-truth coordinates.\n",
    "\n",
    "3. facial landmark localization: normalied and represent the offsets corresponding to the candidate window coordinates. Euclidean loss is applied between the network out and ground-truth coordinates.\n",
    "\n",
    "The overall loss turns out to be a summary of each loss item. α denotes the weighting factors. It controls the importance of certain loss item. Users can fine tune the hyperparameters to drive a higher accuracy. For example, one may want to increase the α value for landmark term if face landmark localization accuracy brings some trouble. \n",
    "\n",
    "<img src=\"images/ipy_pic/equation.jpg\"  width=\"350\" style=\"float: left;\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## MTCNN Model Training Pipeline\n",
    "\n",
    "The following sections walk through data preparation and network training step by step."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1. Prepare the WIDER FACE annotation file\n",
    "\n",
    "Create a \"data_set\" folder to store all WIDER FACE and CNN_FacePoint data. The original WIDER FACE annotation file is in MATLAB format. Convert it to .txt files and store them as anno_train.txt and anno_val.txt for training and validation, respectively."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "python data_preprocessing/transform.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An example of anno_train.txt is shown below. Each line lists an image path followed by the bounding box coordinates of every face in that image.\n",
    "\n",
    "<img src=\"images/ipy_pic/anno_train.png\"  width=\"900\" style=\"float: left;\">"
   ]
  },
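  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A line in this format can be split into a path and groups of four numbers. The sketch below assumes whitespace-separated fields, a path without spaces, and four coordinates per face. The coordinate order written by transform.py is not stated here, so each box is returned as an unlabeled tuple, and `parse_annotation_line` is a hypothetical helper name.\n",
    "\n",
    "```python\n",
    "def parse_annotation_line(line):\n",
    "    # Assumed layout: <image path> followed by 4 numbers per face.\n",
    "    fields = line.strip().split()\n",
    "    path = fields[0]\n",
    "    values = [float(v) for v in fields[1:]]\n",
    "    if len(values) % 4 != 0:\n",
    "        raise ValueError('expected 4 coordinates per face, got %d values' % len(values))\n",
    "    # Group the flat list of numbers into one tuple per face.\n",
    "    boxes = [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]\n",
    "    return path, boxes\n",
    "```"
   ]
  },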
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. Generate PNet training data and annotation file\n",
    "\n",
    "The PNet training data and annotation files are generated by randomly cropping the WIDER FACE images. The generated positive, negative, and part samples are stored in \"data_set/train/12\", in the positive, part, and negative folders. The corresponding annotation files are written to the \"data_preprocessing/anno_store\" folder as pos_12.txt, part_12.txt, and neg_12.txt."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "python data_preprocessing/gen_Pnet_train_data.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assemble the PNet annotation file and shuffle it:"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "python data_preprocessing/assemble_Pnet_imglist.py"
   ]
  },
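  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conceptually, assembling means concatenating the positive, part, and negative annotation lines into one list and shuffling it. The sketch below shows the idea on in-memory lists. The `assemble` helper, the per-class caps that follow the 3:1:1 ratio mentioned earlier, and the fixed seed are assumptions and do not mirror assemble_Pnet_imglist.py line by line.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "\n",
    "def assemble(pos, part, neg, seed=0):\n",
    "    # pos, part, neg are lists of annotation lines (strings).\n",
    "    rng = random.Random(seed)\n",
    "    # Keep at most 3 negatives and 1 part sample per positive sample.\n",
    "    neg = rng.sample(neg, min(len(neg), 3 * len(pos)))\n",
    "    part = rng.sample(part, min(len(part), len(pos)))\n",
    "    merged = list(pos) + part + neg\n",
    "    # Shuffle so that each training batch mixes all sample types.\n",
    "    rng.shuffle(merged)\n",
    "    return merged\n",
    "```"
   ]
  },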
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same procedure can be repeated to generate validation data and annotations. Create \"data_set/val/12\" to hold the validation data, and name the annotation files pos_12_val.txt, part_12_val.txt, and neg_12_val.txt."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. Train PNet model\n",
    "\n",
    "The MTCNN networks must be trained separately and in cascade order. PNet is trained first and is then used to generate training data for the downstream RNet; the process is repeated for ONet. Note that training MTCNN here requires both training and validation data. The validation data is used to select the weights with the lowest validation loss."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "python train/Train_Pnet.py"
   ]
  },
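  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Selecting weights with validation data usually amounts to remembering the epoch with the lowest validation loss. The sketch below is framework-agnostic: `fit`, `train_one_epoch`, `evaluate`, and `save` are hypothetical names, with the last three supplied by the caller, and none of them are functions from train/Train_Pnet.py.\n",
    "\n",
    "```python\n",
    "def fit(num_epochs, train_one_epoch, evaluate, save):\n",
    "    # train_one_epoch(epoch) updates the model, evaluate() returns the validation loss,\n",
    "    # and save(epoch) stores the current weights. All three are placeholders.\n",
    "    best_loss, best_epoch = float('inf'), None\n",
    "    for epoch in range(num_epochs):\n",
    "        train_one_epoch(epoch)\n",
    "        val_loss = evaluate()\n",
    "        # Only keep a checkpoint when the validation loss improves.\n",
    "        if val_loss < best_loss:\n",
    "            best_loss, best_epoch = val_loss, epoch\n",
    "            save(epoch)\n",
    "    return best_epoch, best_loss\n",
    "```"
   ]
  },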
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4. Generate RNet training data and annotation file\n",
    "\n",
    "The RNet training data and annotation files are prepared from the candidate image patches generated by PNet. The patches are cropped and resized to 24x24. The generated positive, negative, and part samples are stored in \"data_set/train/24\", in the positive, part, and negative folders. The corresponding annotation files are written to the \"data_preprocessing/anno_store\" folder as pos_24.txt, part_24.txt, and neg_24.txt. The validation data can be generated using the same procedure."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "python data_preprocessing/gen_Rnet_train_data.py\n",
    "python data_preprocessing/assemble_Rnet_imglist.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5. Train RNet model"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "python train/Train_Rnet.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 6. Generate ONet training data and annotation file\n",
    "\n",
    "As with RNet data generation, the positive, part, and negative samples are stored in \"data_set/train/48\", with the corresponding annotation files pos_48.txt, part_48.txt, and neg_48.txt. Follow the same procedure for validation."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "python data_preprocessing/gen_Onet_train_data.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Landmark data is generated from CNN_FacePoint. Execute the command below to collect the data in the folder \"data_set/train/48/landmark\", with the corresponding annotation file data_preprocessing/anno_store/landmark_48.txt. Follow the same procedure for validation."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "python data_preprocessing/gen_landmark_48.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assemble the ONet annotation file and shuffle it:"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "python data_preprocessing/assemble_Onet_imglist.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 7. Train ONet model  "
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "python train/Train_Onet.py"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
